After examining the correlation between nutrients and diabetes
prevalence, I proceed to explore purchase patterns in different areas
of London. In this part I want to learn whether grocery purchasing
patterns vary across London and, if so, whether that variation has any
bearing on diabetes prevalence.
Purchasing pattern with regard to nutrients
As there are over 800 wards in London, showing the purchase pattern
for every ward would be overwhelming. Therefore, I decided to step back
to the borough level and visualise the purchase pattern for each
nutrient, in order to relate purchase patterns to diabetes
prevalence.
#stacked bar chart
full_join(borough_gdf, grocery_borough, by = c("GSS_CODE" = "area_id"))%>%
select(NAME, fat, saturate, salt, sugar, protein)%>%
# reshaping the data from wide to long format
pivot_longer(!NAME, names_to = "nutrient")%>%
ggplot(aes(y=reorder(NAME, value), x=value, fill=nutrient, label = value))+
geom_col()+
labs(
title = "Purchase Pattern in Each Borough in London",
x = "weight of nutrients (gram)",
y = "borough",
caption = "Data from Aiello et al. (2020) and Greater London Authority (2020)")+
scale_fill_manual(values = wes_palette("Royal2", n = 5))+
theme(legend.text = element_text(size = 8), legend.position = "top", legend.justification = "left")
Insight
While the purchase pattern varied slightly between boroughs, all
boroughs showed the same trend: sugar and fat have the highest average
weights in a product, followed by protein, then saturates, and finally
salt. As established earlier, both sugar and fat have a positive
correlation with diabetes prevalence. The fact that these two nutrients
account for the most weight in an average product is therefore a
potential concern. The nutrient composition of groceries is generally
consistent across boroughs. There is, however, observable variation in
sugar weight between boroughs, which points to differing dietary habits.
It is therefore worthwhile to continue the analysis to understand the
impact of dietary habits on diabetes prevalence.
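One way to put a number on how much each nutrient varies between boroughs is the coefficient of variation (standard deviation divided by mean). The sketch below is illustrative only: `toy_borough` is a hypothetical stand-in for `grocery_borough`, and its values are invented, not taken from the Aiello et al. (2020) data.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for grocery_borough; all values are invented
toy_borough <- tibble(
  area_id = c("B1", "B2", "B3", "B4"),
  fat     = c(9.0, 9.2, 8.8, 9.1),
  sugar   = c(9.5, 11.0, 10.2, 12.1),
  salt    = c(0.6, 0.6, 0.5, 0.6)
)

# Coefficient of variation per nutrient: spread across boroughs
# relative to the mean, so nutrients on different scales are comparable
nutrient_spread <- toy_borough %>%
  pivot_longer(!area_id, names_to = "nutrient") %>%
  group_by(nutrient) %>%
  summarise(
    mean_weight = mean(value),
    cv_percent  = sd(value) / mean(value) * 100,
    .groups = "drop"
  ) %>%
  arrange(desc(cv_percent))

nutrient_spread
```

Applied to the real borough table, the nutrient at the top of `nutrient_spread` would be the one whose weight differs most between boroughs relative to its average.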
Relationship between sugar content and diabetes prevalence
Since sugar has the highest correlation with diabetes prevalence and
the highest weight, I decided to look more closely at its relationship
to diabetes prevalence.
As the dataset from Aiello et al. (2020) does not include the average
diabetes prevalence at borough level, I first aggregated the estimated
diabetes prevalence of each ward to borough level outside the R
environment.
# ward-level diabetes prevalence (diabetes_sf) was exported as a csv file
# and aggregated outside R; read the borough-level result back in
diabetes_prev_borough <- read_csv("Data/diabetes_borough.csv")
head(diabetes_prev_borough, 5)
## # A tibble: 5 × 5
## ...1 gss_code name geometry estimated_diabetes_p…¹
## <dbl> <chr> <chr> <chr> <dbl>
## 1 0 E09000001 City of London POLYGON ((-0.1115… NA
## 2 1 E09000002 Barking and Dagenham MULTIPOLYGON (((0… 7.54
## 3 2 E09000003 Barnet POLYGON ((-0.1995… 6.11
## 4 3 E09000004 Bexley POLYGON ((0.12455… 6.92
## 5 4 E09000005 Brent POLYGON ((-0.1968… 8.67
## # ℹ abbreviated name: ¹estimated_diabetes_prevalence
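The same aggregation could also be kept inside R, which would make the step reproducible. The sketch below is not the method used above (that was done outside R): it assumes an unweighted mean of ward values per borough, and `toy_wards`, its ward codes and its values are hypothetical. If the ward table is an `sf` object such as `diabetes_sf`, the geometry would likely need dropping first with `sf::st_drop_geometry()`.

```r
library(dplyr)

# Hypothetical ward-level table; codes and values are invented
toy_wards <- tibble(
  ward_code = c("W1", "W2", "W3", "W4", "W5"),
  gss_code  = c("E09000002", "E09000002",
                "E09000003", "E09000003", "E09000003"),
  estimated_diabetes_prevalence = c(7.0, 8.0, 6.0, NA, 6.4)
)

# Unweighted mean of ward prevalence per borough (an assumption);
# wards with missing prevalence are skipped rather than zero-filled
toy_borough_prev <- toy_wards %>%
  group_by(gss_code) %>%
  summarise(
    estimated_diabetes_prevalence =
      mean(estimated_diabetes_prevalence, na.rm = TRUE),
    n_wards = n(),
    .groups = "drop"
  )

toy_borough_prev
```

An unweighted mean treats every ward equally. If ward populations were available, a population-weighted mean (for example with `weighted.mean()`) might be a better estimate of borough prevalence.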
# Bar chart with lollipop plot
right_join(grocery_borough, diabetes_prev_borough, by = c("area_id" = "gss_code"))%>%
select(name, sugar, protein, weight, estimated_diabetes_prevalence)%>%
filter(!is.na(estimated_diabetes_prevalence) & estimated_diabetes_prevalence > 0)%>%
group_by(name, estimated_diabetes_prevalence)%>%
mutate(sugar_prop = sugar/weight*100, protein_prop = protein/weight*100)%>%
ggplot()+
geom_hline(
aes(yintercept = y),
data.frame(y = c(0:4) * 2.5),
color = "lightgrey"
) +
geom_col(
aes(x = reorder(str_wrap(name, 5), estimated_diabetes_prevalence), y = estimated_diabetes_prevalence, fill=estimated_diabetes_prevalence),
position = "dodge2",
show.legend = TRUE,
alpha = 0.9
)+
scale_fill_gradientn(
"diabetes prevalence (%)",
colours = c( "#F8B195","#F67280","#C06C84", "#6C5B7B")
)+
geom_segment(
aes(
x = reorder(str_wrap(name, 5), estimated_diabetes_prevalence),
y = 0,
xend = reorder(str_wrap(name, 5), estimated_diabetes_prevalence),
yend = 10
),
linetype = "dashed",
color = "gray93"
)+
geom_segment(
aes(
x = reorder(str_wrap(name, 5), estimated_diabetes_prevalence),
y = 0,
xend = reorder(str_wrap(name, 5), estimated_diabetes_prevalence),
yend = sugar_prop
),
color = "gray12",
linewidth = 1 # `size` for lines is deprecated since ggplot2 3.4.0
)+
geom_point(
aes(x = reorder(str_wrap(name, 5), estimated_diabetes_prevalence), y = sugar_prop, color = sugar_prop),
size = 3
)+
scale_color_gradientn("sugar weight (%)",
colours = c("#f0efeb", "#99c1de", "#000000"))+
coord_polar()+
labs(
title = "Diabetes Prevalence in Each Borough in Relationship to \nAverage Sugar Content in a Grocery Product",
caption = "Data from Aiello et al. (2020)"
)+
annotate(
x = 6.9,
y = 1.5,
label = "sugar \nproportion",
geom = "text",
angle = 7,
color = "gray12",
size = 2.3,
lineheight = 1.1
)+
annotate(
x = 7,
y = 6.3,
label = "estimated \n diabetes \nprevalence",
geom = "text",
angle = -83,
color = "gray12",
size = 2.7,
lineheight = 0.9
) +
annotate(
x = 1.5,
y = 5.0,
label = "5.0%",
geom = "text",
color = "gray12",
size = 3,
angle = -12
) +
annotate(
x = 1.5,
y =7.5,
label = "7.5%",
geom = "text",
color = "gray12",
size = 3,
angle = -12
)+
annotate(
x = 1.5,
y =10,
label = "10.0%",
geom = "text",
color = "gray12",
size = 3,
angle = -12
)+
scale_y_continuous(
limits = c(-1.5, 11),
expand = c(0, 0),
breaks = c(0, 2.5, 5, 7.5, 10) # match the reference lines; must lie within the limits
) +
guides(
color = guide_colorsteps(
barwidth = 15, barheight = .5, title.position = "top", title.hjust = .5
),
fill = guide_colorsteps(
barwidth = 15, barheight = .5, title.position = "top", title.hjust = .5
)
) +
theme(
axis.title = element_blank(),
axis.ticks = element_blank(),
axis.text.y = element_blank(),
axis.text.x = element_text(color = "gray12", size = 9, vjust = 2),
panel.grid = element_blank(),
panel.grid.major.x = element_blank(),
legend.text = element_text(size = 8),
legend.position = "top",
legend.justification = "left"
)
Insight
In this plot, the variation in diabetes prevalence is more visible
than the variation in the proportion of sugar in the average product.
The general trend is that diabetes prevalence increases as the
proportion of sugar increases, but there are multiple exceptions. For
instance, Sutton, the borough with the highest average proportion of
sugar in a grocery product, does not have a high diabetes prevalence.
Meanwhile, the average proportion of sugar in Hammersmith and Fulham is
similar to that in Harrow, yet the two boroughs sit at opposite extremes
of diabetes prevalence. Thus, while this plot supports a relationship
between sugar content and diabetes prevalence, it also shows the limits
of sugar content in explaining diabetes prevalence.
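A rank correlation is one way to check how strong the general trend is once the exceptions are taken into account. The sketch below uses base R's `cor.test()` with the Spearman method. The `toy` data frame is hypothetical and its values are invented; the column names simply mirror `sugar_prop` and `estimated_diabetes_prevalence` from the plotting code.

```r
# Hypothetical borough-level values, invented for illustration
toy <- data.frame(
  name = c("A", "B", "C", "D", "E", "F"),
  sugar_prop = c(9.8, 10.1, 10.4, 10.6, 10.9, 11.3),
  estimated_diabetes_prevalence = c(5.1, 6.8, 4.9, 7.2, 6.5, 8.0)
)

# Spearman's rho compares the rank order of boroughs on the two
# variables, so it is less sensitive to a few extreme boroughs
# than Pearson's r
result <- cor.test(
  toy$sugar_prop,
  toy$estimated_diabetes_prevalence,
  method = "spearman"
)

result$estimate  # rho: strength and direction of the monotonic trend
result$p.value   # small values suggest the trend is unlikely to be chance
```

With only around 30 boroughs, any such estimate would carry wide uncertainty, so it should be read alongside the plot rather than instead of it.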